Robust Phoneme Recognition Using High Resolution Temporal Envelopes
Abstract
Frequency domain linear prediction (FDLP) is a technique for auto-regressive (AR) modeling of the Hilbert envelope of a signal. The model is derived by applying linear prediction to the discrete cosine transform (DCT) of the signal. In this paper, we propose modifications of the basic FDLP approach for deriving high resolution envelopes. We identify several factors that affect temporal resolution in FDLP: the location of the input peaks within the analysis segment, the type of window applied in the DCT of the signal, and the order of the FDLP model. This analysis enables us to improve the resolution of temporal envelopes derived from FDLP. Features extracted from the high resolution envelopes outperform MFCC features in noisy phoneme recognition experiments (relative improvements of 10%) and in phoneme recognition on conversational telephone speech (relative improvements of 5%).
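The basic FDLP step described in the abstract (linear prediction applied to the DCT of the signal) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fdlp_envelope`, the use of an orthonormal DCT-II, the autocorrelation (Yule-Walker) method, and the default model order are all assumptions chosen for illustration, and none of the resolution improvements proposed in the paper are included.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20, n_points=None):
    """Illustrative FDLP sketch (hypothetical helper, not the paper's code).

    Fits an all-pole model to the DCT of `x`; the model's power response,
    read along the "frequency" axis, approximates the squared Hilbert
    envelope of `x` along the time axis.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_points = n if n_points is None else n_points

    # Step 1: DCT of the whole analysis segment (assumed: DCT-II, orthonormal).
    c = dct(x, type=2, norm="ortho")

    # Step 2: autocorrelation of the DCT sequence for lags 0..order.
    r = np.correlate(c, c, mode="full")[n - 1:n + order]

    # Step 3: solve the Yule-Walker (Toeplitz) equations for the predictor.
    a = solve_toeplitz(r[:order], r[1:order + 1])
    gain = r[0] - np.dot(a, r[1:order + 1])  # prediction error power

    # Step 4: evaluate gain / |A|^2 on n_points samples covering [0, pi),
    # which maps onto the time span of the analysis segment.
    inverse_filter = np.concatenate(([1.0], -a))
    response = np.fft.rfft(inverse_filter, 2 * n_points)[:n_points]
    return gain / np.abs(response) ** 2


# Usage: a single impulse in weak noise (synthetic data, assumed for the demo).
rng = np.random.default_rng(0)
x = 0.001 * rng.standard_normal(1000)
x[300] += 1.0
env = fdlp_envelope(x, order=20)
peak_location = int(np.argmax(env))
```

In this toy case the envelope estimate should peak close to sample 300, the location of the impulse. Varying `order` or moving the impulse toward the edges of the segment is a simple way to observe the resolution effects that the abstract refers to.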
Similar resources
Temporal resolution analysis in frequency domain linear prediction.
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors on the temporal resolution are studied, and this analysis suggests ways to improve the resolution of the FDLP envelo...
Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech
In this paper, we present a spectro-temporal feature extraction technique using sub-band Hilbert envelopes of relatively long segments of speech signal. Hilbert envelopes of the sub-bands are estimated using Frequency Domain Linear Prediction (FDLP). Spectral features are derived by integrating the sub-band Hilbert envelopes in short-term frames and the temporal features are formed by convertin...
Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, t...
Static and dynamic modulation spectrum for speech recognition
We present a feature extraction technique based on static and dynamic modulation spectrum derived from long-term envelopes in sub-bands. Estimation of the sub-band temporal envelopes is done using Frequency Domain Linear Prediction (FDLP). These sub-band envelopes are compressed with a static (logarithmic) and dynamic (adaptive loops) compression. The compressed sub-band envelopes are transform...
Modulation frequency features for phoneme recognition in noisy speech.
In this letter, a new feature extraction technique based on modulation spectrum derived from syllable-length segments of subband temporal envelopes is proposed. These subband envelopes are derived from autoregressive modeling of Hilbert envelopes of the signal in critical bands, processed by both a static (logarithmic) and a dynamic (adaptive loops) compression. These features are then used for...